Edit Distance for XML Information Retrieval: Some Experiments on the Datacentric Track of INEX 2011

Authors

  • Cyril Laitang
  • Karen Pinel-Sauvagnat
  • Mohand Boughanem
Abstract

In this paper we present our structured information retrieval model based on subgraph similarity. Our approach combines a content propagation technique, which handles sibling relationships, with a document-query matching process on structure. The latter is based on tree edit distance (TED), the minimum-cost sequence of insert, delete, and replace operations that turns one tree into another. As the effectiveness of TED depends on both the input trees and the edit costs, we experimented with various subtree extraction techniques as well as different costs based on the DTD associated with the Datacentric collection.
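To make the tree edit distance mentioned in the abstract concrete, here is a minimal Python sketch of the textbook forest recursion for ordered labeled trees. It is not the authors' implementation: the `node` helper, the tuple-based tree encoding, and the uniform unit costs are assumptions made for illustration, whereas the paper derives its costs from the DTD.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, (child, child, ...)); a forest is a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

def tree_edit_distance(t1, t2, ins=1, dele=1, rep=1):
    """Edit distance between two ordered labeled trees.

    ins, dele, rep are uniform placeholder costs (an assumption);
    they stand in for the DTD-based costs studied in the paper.
    """
    @lru_cache(maxsize=None)
    def dist(f, g):
        # Both forests empty: nothing left to edit.
        if not f and not g:
            return 0
        # Only f has nodes: delete its rightmost root, promoting its children.
        if not g:
            _, kids = f[-1]
            return dist(f[:-1] + kids, g) + dele
        # Only g has nodes: insert its rightmost root.
        if not f:
            _, kids = g[-1]
            return dist(f, g[:-1] + kids) + ins
        (lv, kv), (lw, kw) = f[-1], g[-1]
        return min(
            dist(f[:-1] + kv, g) + dele,          # delete rightmost root of f
            dist(f, g[:-1] + kw) + ins,           # insert rightmost root of g
            dist(kv, kw) + dist(f[:-1], g[:-1])   # match the two rightmost roots
            + (0 if lv == lw else rep),
        )

    return dist((t1,), (t2,))

# Hypothetical query and document subtrees.
query = node("movie", node("title"), node("actor"))
doc = node("movie", node("title"), node("director"), node("actor"))
print(tree_edit_distance(query, doc))
```

The memoized recursion is simple rather than fast; a retrieval system working on large document trees would more likely use a dedicated algorithm such as Zhang-Shasha, and would replace the unit costs with per-label costs.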


Similar resources

Coûts de distance d'édition pour la Recherche d'Information XML

Structured information retrieval (SIR) on XML documents makes it possible to retrieve focused parts of documents that match the user's needs. These needs can be expressed through content and structure queries which, like XML documents, can be represented as trees. Our approach uses these trees through tree edit distance to estimate the relevance of XML elements. Tree edit distance is the minimum set o...


DTD Based Costs for Tree-Edit Distance in Structured Information Retrieval

In this paper we present a Structured Information Retrieval (SIR) model based on graph matching. Our approach combines content propagation, which handles sibling relationships, with a document-query structure matching process. The latter is based on Tree-Edit Distance (TED), the minimum-cost sequence of insert, delete, and replace operations that turns one tree into another. To our knowledge this algo...


From Focused Elements to Snippets. A thesis submitted to the Faculty of the Graduate School of the University of Minnesota by Supraja Nagalla in partial fulfillment of the requirements for the degree of Master of Science

Information Retrieval is a field of computing which traditionally deals with searching a large collection of documents and retrieving documents based on their similarity to the query. INEX [10] provides a platform (e.g., document collection, queries and uniform evaluation metrics) for the development and evaluation of retrieval algorithms for XML documents. The focus of INEX is to reduce the gr...


Overview of INEX 2004

The widespread use of the eXtensible Markup Language (XML) in scientific data repositories, digital libraries and on the web has brought about an explosion in the development of XML retrieval systems. These systems exploit the logical structure of documents, which is explicitly represented by the XML markup: instead of whole documents, only components thereof (the so-called XML elements) are retri...


The Interactive Track at INEX 2005

In its second year, the Interactive Track at INEX focused on addressing some fundamental issues of interactive XML retrieval: is element retrieval useful for searchers, what granularity of elements do searchers find more useful, what applications for element retrieval can be viable in interactive environments, etc. In addition, the track also expanded by offering an alternative document collec...




Publication date: 2011